### Replication Package Overview

This package reproduces the results reported in the manuscript. The `run.R` script executes all necessary code in order. 

NOTE: By default, the provided R scripts read the merged datasets from file, so that figures and tables are reproduced exactly as they appear in the paper. To rerun the record linkage procedure instead, set `from_file <- FALSE` in the `run.R` script.
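As a rough illustration of how a flag like `from_file` can switch between cached results and a fresh linkage run, consider the sketch below. It is not taken from `run.R`: the helper `load_or_link()`, the temporary `.rds` file, and the toy data frame are all hypothetical, and the actual scripts may organize this logic differently.

```r
# Hypothetical sketch; the actual logic in run.R may differ.
from_file <- TRUE  # set to FALSE to rerun the record linkage

# Hypothetical helper: read a cached result when from_file is TRUE and the
# file exists; otherwise recompute the result and cache it for later runs.
load_or_link <- function(path, link_fn, from_file = TRUE) {
  if (from_file && file.exists(path)) {
    return(readRDS(path))
  }
  result <- link_fn()
  saveRDS(result, path)
  result
}

# Toy usage, with a temporary file standing in for a dataset in data/
tmp <- tempfile(fileext = ".rds")

# First call: no cache yet, so the (toy) linkage function runs
linked <- load_or_link(
  tmp,
  function() data.frame(a = "x", b = "y"),
  from_file = FALSE
)

# Second call: the cached file is read and the linkage function is never called
cached <- load_or_link(
  tmp,
  function() stop("linkage should not rerun"),
  from_file = from_file
)
```

With this pattern, the expensive step runs only when explicitly requested, which is one way to keep the default run short.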

#### File Structure

- `run.R`: Main script to reproduce all analyses
- `run.log`: Log file produced by `run.R`
- `code/`: Contains scripts executed by `run.R`
  - `001_link-candidates.R`: Performs record linkage from Application 1
  - `002_link-cities.R`: Performs record linkage from Application 2
  - `003_link-organizations.R`: Performs record linkage from Application 3
  - `004_link-parties.R`: Performs record linkage from Application 4
  - `005_table1.R`: Reproduces Table 1
  - `006_candidates.R`: Performs analyses reported in Section 3.1, computing precision and recall for fuzzylink and fastLink approaches
  - `007_cities.R`: Performs analyses reported in Section 3.2, reproducing Table 2
  - `008_organizations.R`: Performs analyses reported in Section 3.3, computing performance metrics for fuzzylink merge of organization names
  - `009_parties.R`: Performs analyses reported in Section 3.4, reproducing Table 3, Figure 1, Figure A8, and Table A4
  - `010_appendix.R`: Reproduces figures and tables from the paper's appendix
  - `acnet.R`: Functions to query the Amicus Curiae Networks API (Abi-Hassan et al. 2023)
- `data/`: Contains intermediate results produced by the record linkage procedure (when `from_file <- FALSE` in `run.R`) and hand-labeled record pairs produced by the author and research assistants
- `raw/`: Contains raw data files merged in each application
  - `ceda.RData`: California Elections Data Archive (Sacramento State University, 2010-2022), combined into a single dataset by the author.
  - `L2.RData`: Selection of observations from L2 voter file, retrieved March 2024.
  - `150k_plus_cities.csv`: A dataset containing the listed addresses of Paycheck Protection Program (PPP) recipients (Kaufman & Klevs 2022)
  - `uscities.csv`: A dataset containing a list of all incorporated US cities (2022).
  - `full_train_set.csv`: Training set for random forest model from Kaufman & Klevs (2022)
  - `parlgov_elections.csv`: ParlGov elections dataset (2023) <https://www.parlgov.org/data-info/>
  - `parlgov_countries.csv`: Dataset of country-adjective pairs appearing in the ParlGov dataset (e.g. Norway -> Norwegian). Author-created.
  - `dime_contributors_organizations_1979_2022.RData`: From Bonica (2023) "Database on Ideology, Money in Politics, and Elections" 
  - `acnet_scores_5_13_22.RData`: Merged dataset from Abi-Hassan et al. (2023)
- `figures/`: Contains programmatically generated manuscript figures
- `tables/`: Contains programmatically generated manuscript tables

#### Software and Environment

- R 4.5.0 with packages: tidyverse, fuzzylink, fastLink, glue, pROC, readxl, stringdist, stringmatch, tinytable, xtable
- OS: Microsoft Windows 10.0.19045
- Runtime: Less than 1 minute (when reading merged datasets from file) or approximately 7 hours (when re-executing all record linkage procedures)
- Resources: 10 CPU cores, 16GB RAM, 0 GPU cores 
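
Before running `run.R`, it may help to confirm that the packages listed above are available. The following is a minimal sketch that only reports missing packages; it does not install anything, and the variable names are illustrative rather than part of the replication code.

```r
# Packages listed under "Software and Environment"
required <- c(
  "tidyverse", "fuzzylink", "fastLink", "glue", "pROC",
  "readxl", "stringdist", "stringmatch", "tinytable", "xtable"
)

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}

# Report the running R version for comparison with the version listed above
message("R version: ", as.character(getRversion()))
```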

